Skip to content

Conversation

@arturaz
Copy link
Contributor

@arturaz arturaz commented Jun 21, 2025

This PR contains multiple fixes.

  1. macOS FSEventsWatcher was modified to use the new dispatch queue-based API instead of the old, deprecated run-loop based API. This made closing the watcher easier, without any class-level state to maintain, making sure resource release is always done appropriately.
  2. Invoking watcher.close() is now threadsafe and idempotent.
  3. Invoking os.watch now requires the directory to exist, as a sentinel file is written to that directory to make sure os.watch is actually watching the directory before it returns. An exception is thrown if the directory does not exist.

Tested with tests added to os/watch/test/src/WatchTests.scala.

These tests work fine in current main branch as well:

[238] + test.os.watch.WatchTests.emptyFolder.singleFileChangeManyTimes 4558ms  
[238] + test.os.watch.WatchTests.manyFiles.inManyFoldersSmall 2068ms  
[238] + test.os.watch.WatchTests.manyFiles.inManyFoldersMedium 3369ms  
[238] + test.os.watch.WatchTests.manyFiles.inManyFoldersLarge 4056ms  
[238] + test.os.watch.WatchTests.manyFiles.inManyFoldersLargest 15ms  
[238] + test.os.watch.WatchTests.manyFiles.inManyFoldersThreaded 28817ms  
[238] + test.os.watch.WatchTests.manyFiles.inManyFoldersThreadedSequential 26746ms  
[238] + test.os.watch.WatchTests.openClose.once 15ms

These tests reliably exhibit the JVM crash in main branch:

arturaz@Arturass-MacBook-Air os-lib % ./mill -w 'os.watch.jvm[3.3.5].test.test' test.os.watch.WatchTests.emptyFolder.doesNotLeaveSentinel
[238/238] os.watch.jvm[3.3.5].test.test
[238] --- Running Tests test.os.watch.WatchTests.emptyFolder.doesNotLeaveSentinel ---
[238] #
[238] # A fatal error has been detected by the Java Runtime Environment:
[238] #
[238] #  SIGSEGV (0xb) at pc=0x000000018409957c, pid=41664, tid=3843
[238] #
[238] # JRE version: OpenJDK Runtime Environment Temurin-21.0.1+12 (21.0.1+12) (build 21.0.1+12-LTS)
[238] # Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (21.0.1+12-LTS, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
[238] # Problematic frame:
[238] # C  [CoreFoundation+0x15057c]  __CFCheckCFInfoPACSignature+0x4
[238] #
[238] # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
[238] #
[238] # An error report file with more information is saved as:
[238] # /Users/arturaz/work/os-lib/out/os/watch/jvm/3.3.5/test/test.dest/sandbox/hs_err_pid41664.log
[238] #
[238] # If you would like to submit a bug report, please visit:
[238] #   https://github.com/adoptium/adoptium-support/issues
[238] # The crash happened outside the Java Virtual Machine in native code.
[238] # See problematic frame for where to report the bug.
[238] #

These tests either fail in main branch because the behaviour there is different:

[238] ----------- Running Tests test.os.watch.WatchTests.nonExistentFolder -----------
[238] X test.os.watch.WatchTests.nonExistentFolder 213ms 
[238]   utest.AssertionError: _root_.os.watch.watch(Seq(wd / "does-not-exist"), onEvent = _ => ())
[238]   wd: os.Path = /Users/arturaz/work/os-lib/out/os/watch/jvm/3.3.5/test/test.dest/sandbox/out/scratch/watch/WatchTests/tests/nonExistentFolder
[238]     utest.asserts.Util$.assertError(Util.scala:20)
[238]     utest.asserts.Asserts$.interceptImpl(Asserts.scala:49)
[238]     test.os.watch.WatchTests$.$init$$$anonfun$1$$anonfun$1$$anonfun$1(WatchTests.scala:60)
[238]     test.os.TestUtil$.prep$$anonfun$1(TestUtil.scala:92)
[238]     test.os.TestUtil$.mkDir(TestUtil.scala:64)
[238]     test.os.TestUtil$.prep(TestUtil.scala:93)
[238]     test.os.watch.WatchTests$.$init$$$anonfun$1$$anonfun$1(WatchTests.scala:60)
[238] Tests: 1, Passed: 0, Failed: 1

or randomly crash:

[238] -------------------- Running Tests test.os.watch.WatchTests --------------------
[238] X test.os.watch.WatchTests.nonExistentFolder 211ms 
[238]   utest.AssertionError: _root_.os.watch.watch(Seq(wd / "does-not-exist"), onEvent = _ => ())
[238]   wd: os.Path = /Users/arturaz/work/os-lib/out/os/watch/jvm/3.3.5/test/test.dest/sandbox/out/scratch/watch/WatchTests/tests/nonExistentFolder
[238]     utest.asserts.Util$.assertError(Util.scala:20)
[238]     utest.asserts.Asserts$.interceptImpl(Asserts.scala:49)
[238]     test.os.watch.WatchTests$.$init$$$anonfun$1$$anonfun$1$$anonfun$1(WatchTests.scala:60)
[238]     test.os.TestUtil$.prep$$anonfun$1(TestUtil.scala:92)
[238]     test.os.TestUtil$.mkDir(TestUtil.scala:64)
[238]     test.os.TestUtil$.prep(TestUtil.scala:93)
[238]     test.os.watch.WatchTests$.$init$$$anonfun$1$$anonfun$1(WatchTests.scala:60)
[238] #
[238] # A fatal error has been detected by the Java Runtime Environment:
[238] #
[238] #  SIGSEGV (0xb) at pc=0x000000018409957c, pid=33912, tid=4355
[238] #
[238] # JRE version: OpenJDK Runtime Environment Temurin-21.0.1+12 (21.0.1+12) (build 21.0.1+12-LTS)
[238] # Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (21.0.1+12-LTS, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
[238] # Problematic frame:
[238] # C  [CoreFoundation+0x15057c]  __CFCheckCFInfoPACSignature+0x4
[238] #
[238] # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
[238] #
[238] # An error report file with more information is saved as:
[238] # /Users/arturaz/work/os-lib/out/os/watch/jvm/3.3.5/test/test.dest/sandbox/hs_err_pid33912.log
[238] [0.663s][warning][os] Loading hsdis library failed
[238] #
[238] # If you would like to submit a bug report, please visit:
[238] #   https://github.com/adoptium/adoptium-support/issues
[238] # The crash happened outside the Java Virtual Machine in native code.
[238] # See problematic frame for where to report the bug.
[238] #

This one randomly fails on main as well:

[238] X test.os.watch.WatchTests.openClose.manyTimes 4668ms 
[238]   java.util.concurrent.TimeoutException: no file system changes detected within 3 seconds
[238]     test.os.watch.WatchTests$.testOpenClose$1$$anonfun$1(WatchTests.scala:354)
[238]     scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
[238]     scala.collection.immutable.Range.foreach(Range.scala:192)
[238]     test.os.watch.WatchTests$.testOpenClose$1(WatchTests.scala:333)
[238]     test.os.watch.WatchTests$.$init$$$anonfun$1$$anonfun$5$$anonfun$2$$anonfun$1(WatchTests.scala:367)
[238]     scala.runtime.function.JProcedure1.apply(JProcedure1.java:15)
[238]     scala.runtime.function.JProcedure1.apply(JProcedure1.java:10)
[238]     test.os.TestUtil$.prep$$anonfun$1(TestUtil.scala:92)
[238]     test.os.TestUtil$.mkDir(TestUtil.scala:64)
[238]     test.os.TestUtil$.prep(TestUtil.scala:93)
[238]     test.os.watch.WatchTests$.$init$$$anonfun$1$$anonfun$5$$anonfun$2(WatchTests.scala:60)

This one fails because the behaviour has changed:

[238] X test.os.watch.WatchTests.closeIsSafeToInvokeMultipleTimes 210ms 
[238]   java.lang.AssertionError: assertion failed
[238]     scala.runtime.Scala3RunTime$.assertFailed(Scala3RunTime.scala:11)
[238]     os.watch.FSEventsWatcher.close(FSEventsWatcher.scala:101)
[238]     test.os.watch.WatchTests$.$init$$$anonfun$1$$anonfun$6$$anonfun$1(WatchTests.scala:379)
[238]     scala.runtime.function.JProcedure1.apply(JProcedure1.java:15)
[238]     scala.runtime.function.JProcedure1.apply(JProcedure1.java:10)
[238]     test.os.TestUtil$.mkDir(TestUtil.scala:64)
[238]     test.os.watch.WatchTests$.$init$$$anonfun$1$$anonfun$6(WatchTests.scala:60)

This should fix the issue witnessed in mill (com-lihaoyi/mill#5225)

@lihaoyi
Copy link
Member

lihaoyi commented Jun 22, 2025

Can you mention in the PR description how this was tested? Manual testing or one-off testing is fine as well, just should be documented for posterity

@arturaz
Copy link
Contributor Author

arturaz commented Jun 22, 2025

Can you mention in the PR description how this was tested? Manual testing or one-off testing is fine as well, just should be documented for posterity

I assumed that the diff in os/watch/test/src/WatchTests.scala is sufficient, not sure what more to say about it :)

@lihaoyi
Copy link
Member

lihaoyi commented Jun 22, 2025

@arturaz I'd like to know which of those tests are:

  1. Tests that pass before and after this PR, i.e. being stricter about testing existing code
  2. Tests that fail before this PR and pass after it, i.e. you actually fixed something and added a regression test for the fix

These two categories of tests have very different properties, and without such a distinction it is difficult for a reviewer to understand what you are actually testing

@lihaoyi
Copy link
Member

lihaoyi commented Jun 22, 2025

Also, given that this is meant to be used in Mill, it's worth mentioning if any publishLocal smoke testing was done just to make sure manual usage in Mill or the Mill watch-source-input integration tests work with this change

@arturaz
Copy link
Contributor Author

arturaz commented Jun 22, 2025

Updated the comment.

./mill integration.invalidation[watch-source-input].local.nodaemon.testForked works fine on Linux but seems to hang/run very slowly on Mac with these changes.

[4776-1] -------------------------------- Running Tests --------------------------------
[4776-0] Copying integration test sources from /Users/arturaz/work/mill/integration/invalidation/watch-source-input/resources to /Users/arturaz/work/mill/out/integration/invalidation/watch-source-input/local/nodaemon/testForked.dest/worker-0/sandbox/run-1
[4776-1] Copying integration test sources from /Users/arturaz/work/mill/integration/invalidation/watch-source-input/resources to /Users/arturaz/work/mill/out/integration/invalidation/watch-source-input/local/nodaemon/testForked.dest/worker-1/sandbox/run-1
[4776-1] Copying integration test sources from /Users/arturaz/work/mill/integration/invalidation/watch-source-input/resources to /Users/arturaz/work/mill/out/integration/invalidation/watch-source-input/local/nodaemon/testForked.dest/worker-1/sandbox/run-2
[4776-0] Copying integration test sources from /Users/arturaz/work/mill/integration/invalidation/watch-source-input/resources to /Users/arturaz/work/mill/out/integration/invalidation/watch-source-input/local/nodaemon/testForked.dest/worker-0/sandbox/run-2
[4776/4776] ============================== integration.invalidation[watch-source-input].local.nodaemon.testForked ============================== 183s
[4776] integration.invalidation[watch-source-input].local.nodaemon.testForked 179s 0/2 completed.
[4776-0] 179s mill.integration.WatchSourceTests 179s
[4776-1] 179s mill.integration.WatchInputTests 179s

Got to run now, will investigate later.

@arturaz arturaz marked this pull request as draft June 22, 2025 11:55
@arturaz
Copy link
Contributor Author

arturaz commented Jun 24, 2025

Fixed the issue on macos, now mill's integration tests pass.

[6599-1] + mill.integration.WatchInputTests.input.show 12599ms  
[6599-1] Tests: 2, Passed: 2, Failed: 0
[6599-0] + mill.integration.WatchSourceTests.sources.show 14122ms  
[6599-0] Tests: 2, Passed: 2, Failed: 0
[6599/6599] ============================== integration.invalidation[watch-source-input].packaged.nodaemon.testForked ============================== 30s

The problem was that if you passed filter that filtered out the sentinel files that we created and thus os.watch threw an exception.

@arturaz arturaz marked this pull request as ready for review June 24, 2025 14:56
@lihaoyi lihaoyi merged commit 6edd90e into com-lihaoyi:main Jun 25, 2025
7 of 8 checks passed
@lihaoyi
Copy link
Member

lihaoyi commented Jun 25, 2025

tagged this as 0.11.0-M10, should be available once this job completes https://github.com/com-lihaoyi/os-lib/actions/runs/15864572422

@lefou
Copy link
Member

lefou commented Jun 25, 2025

I think the correct tag is 0.11.5-M10

@lefou lefou added this to the 0.11.5 milestone Jun 25, 2025
lihaoyi added a commit to com-lihaoyi/mill that referenced this pull request Jun 26, 2025
lihaoyi added a commit to com-lihaoyi/mill that referenced this pull request Jun 26, 2025
This should let us dogfood
com-lihaoyi/os-lib#398 and see if
#5225 still occurs
lihaoyi pushed a commit to com-lihaoyi/mill that referenced this pull request Jun 26, 2025
While debugging com-lihaoyi/os-lib#398 I
encountered a lackluster developer experience when running `WatchTests`.

- It took ages for the tests to fail.
- When the tests failed, the amount of context was very low.

This PR improves on both of those areas:
- `WatchTests` timeout is now `15s` instead of `120s` when running not
on CI.
- If the test fails the `eval` context (things like working directory,
the command line for eval and environment variables) is printed out.

To achieve that, I introduced:
- `prepEval` - as `eval` but doesn't actually execute immediatelly.
- `withTestClues` - enhances `utest.AssertionError` thrown in the block
with the given `TestValues`

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
@arturaz arturaz deleted the improvement/os-watch-test-coverage branch July 7, 2025 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants